Regex for Beginners ✳️
Published at Dec 22, 2024
What is Regex? 🤔
Regex (short for regular expression) is a tool for matching patterns in strings. In JavaScript and several other languages, a regex literal is written as two forward slashes (/) with the pattern in between, followed by optional flags that modify its behavior.
/pattern/flags
Flags 🚩
Flags change how regex behaves:
- Case Insensitive (i): Matches without considering case.
- Global (g): Finds all matches instead of stopping at the first.
/the/gi
Matches “the” in any case, globally.
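As a rough illustration of what the flags change, here is a minimal JavaScript sketch. The sample sentence and variable names are made up for this example.

```javascript
const text = "The cat sat on the mat. THE END";

// No flags: the match is case-sensitive and match() stops at the first hit.
const first = text.match(/the/);

// g + i: every occurrence is returned, regardless of case.
const all = text.match(/the/gi);

console.log(first[0], first.index); // the 15
console.log(all);                   // [ 'The', 'the', 'THE' ]
```

Without the i flag, the capitalized “The” at the start of the sentence is skipped, which is why the first match is found at index 15.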
Literal & Metacharacters 🔡🔣
Literal Characters 🔡
Regex can match literal characters. For instance, the regex:
cat
matches occurrences of “cat” in a string.
Metacharacters 🔣
Metacharacters are special characters with specific meanings:
- * (star): Matches zero or more occurrences of the preceding element.
- . (dot): Matches any single character (by default, anything except a line break).
If you want to use the literal value of a metacharacter, escape it with a backslash (\).
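The difference between a metacharacter and its escaped, literal form can be sketched in JavaScript as follows. The test strings are invented for illustration.

```javascript
// Unescaped dot: matches any single character between "c" and "t".
const anyChar = /c.t/g;
// Escaped dot: matches only a literal ".".
const literalDot = /3\.14/;

const dotMatches = "cat cot c-t ct".match(anyChar);

console.log(dotMatches);              // [ 'cat', 'cot', 'c-t' ]
console.log(literalDot.test("3.14")); // true
console.log(literalDot.test("3x14")); // false
console.log(/3.14/.test("3x14"));     // true, because "." also matches "x"
```

Note that “ct” is not matched: the dot requires exactly one character between the two letters.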
Quantifiers 🧮
Quantifiers specify how many times a pattern should match:
- *: Zero or more times.
- +: One or more times.
- ?: Zero or one time (optional).
- {n}: Exactly n times.
- {n,}: At least n times.
- {n,m}: Between n and m times.
Because * allows zero repetitions, a pattern such as a* can also match an empty string: it literally “matches” nothing.
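The quantifiers above can be tried out with a small JavaScript sketch. It assumes the ^ and $ anchors (covered later in this article) so that each test checks the whole string rather than a substring. The check helper and the sample strings are hypothetical names chosen for this example.

```javascript
const samples = ["", "a", "aa", "aaaaa"];

// Hypothetical helper: returns the samples that a regex accepts in full.
const check = (re) => samples.filter((s) => re.test(s));

const star = check(/^a*$/);       // zero or more
const plus = check(/^a+$/);       // one or more
const range = check(/^a{2,4}$/);  // between 2 and 4
const atLeast = check(/^a{2,}$/); // at least 2

console.log(star);    // [ '', 'a', 'aa', 'aaaaa' ]
console.log(plus);    // [ 'a', 'aa', 'aaaaa' ]
console.log(range);   // [ 'aa' ]
console.log(atLeast); // [ 'aa', 'aaaaa' ]

// ? makes the preceding "y" optional.
const optional = ["ha", "hay", "hayy"].filter((s) => /^hay?$/.test(s));
console.log(optional); // [ 'ha', 'hay' ]
```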
Examples 📝
a* matches
"a", "aa", "aaaaaaaaaaaaa" (many times) or an empty string
a+ matches
"a", "aa", "aaa", or "aaaa"
a{2,4} matches
"aa", "aaa", or "aaaa"
hay? matches
"ha" or "hay"
Greedy 🤑 vs. Lazy Matching 😴
- Greedy Matching: Matches as much as possible.
- Lazy Matching: Matches as little as possible by adding ? after the quantifier.
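A common place where the difference shows up is quoted text. The following JavaScript sketch uses an invented input string to compare the two modes.

```javascript
const text = 'say "one" and "two"';

// Greedy: .* runs on to the last closing quote it can reach.
const greedy = text.match(/".*"/)[0];
// Lazy: .*? stops at the first closing quote.
const lazy = text.match(/".*?"/)[0];

console.log(greedy); // "one" and "two"
console.log(lazy);   // "one"
```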
Examples 📝
When looking at the sentence:
The quick brown fox jumps over the lazy dog.
Greedy
T.*o matches
The quick brown fox jumps over the lazy do
Lazy
T.*?o matches
The quick bro
Bracket Expressions
Bracket expressions match specific characters:
- [abc]: Matches “a”, “b”, or “c”.
- [a-z]: Matches any lowercase letter.
- [A-Z0-9]: Matches uppercase letters or digits.
- [^abc]: Matches anything except “a”, “b”, or “c”.
Example
[a-zA-Z] matches any letter.
[0-9] matches any digit.
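To see bracket expressions pick out characters, here is a short JavaScript sketch. The sample string is made up, and the g flag is used so that every matching character is collected.

```javascript
const text = "Room B12, floor 3!";

// Collect every lowercase letter.
const lower = text.match(/[a-z]/g).join("");
// Collect every uppercase letter or digit.
const upperOrDigit = text.match(/[A-Z0-9]/g).join("");
// Negated class: remove everything that is not a letter.
const lettersOnly = text.replace(/[^a-zA-Z]/g, "");

console.log(lower);        // oomfloor
console.log(upperOrDigit); // RB123
console.log(lettersOnly);  // RoomBfloor
```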
Character Classes
Shorthand for common patterns:
- \d: Matches digits ([0-9]).
- \w: Matches word characters ([a-zA-Z0-9_]).
- \s: Matches whitespace.
- \D, \W, \S: Match the inverse.
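The shorthand classes can be sketched in JavaScript like this. The input string is invented for the example and contains a tab character to show that \s covers more than spaces.

```javascript
const text = "Order 66: user_id=42\tok";

const numbers = text.match(/\d+/g);          // runs of digits
const words = text.match(/\w+/g);            // runs of word characters
const parts = text.split(/\s+/);             // split on any whitespace
const digitsOnly = text.replace(/\D/g, "");  // drop every non-digit

console.log(numbers);    // [ '66', '42' ]
console.log(words);      // [ 'Order', '66', 'user_id', '42', 'ok' ]
console.log(parts);      // [ 'Order', '66:', 'user_id=42', 'ok' ]
console.log(digitsOnly); // 6642
```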
Anchors
Anchors match specific positions in a string:
- ^: Start of a string.
- $: End of a string.
- \b: Word boundary.
Example
^The matches “The” at the start of a string.
end$ matches “end” at the end of a string.
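Anchors match positions rather than characters, which a few quick JavaScript checks can illustrate. The test strings are made up for this sketch.

```javascript
console.log(/^The/.test("The end"));    // true
console.log(/^The/.test("At The end")); // false: "The" is not at the start
console.log(/end$/.test("The end"));    // true
console.log(/end$/.test("The ending")); // false: "end" is not at the end

// \b restricts a match to whole words.
const phrase = "cat concat category";
const wholeWord = phrase.match(/\bcat\b/g);
const anywhere = phrase.match(/cat/g);

console.log(wholeWord);       // [ 'cat' ]
console.log(anywhere.length); // 3
```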
Groups and Alternation
- Capturing Groups: Use parentheses to group patterns.
  - Example: (fox|dog) matches “fox” or “dog”.
- Alternation: Use | for logical OR.
  - Example: cat|dog matches “cat” or “dog”.
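Grouping matters because it limits how far an alternation reaches, and because the group's text is captured separately. A minimal JavaScript sketch, reusing the fox and dog sentence:

```javascript
const text = "The quick brown fox jumps over the lazy dog.";

// The group limits the alternation to "fox" or "dog" and captures the choice.
const m = text.match(/lazy (fox|dog)/);
console.log(m[0]); // lazy dog  (the whole match)
console.log(m[1]); // dog       (the first capturing group)

// Plain alternation with the g flag finds every alternative.
const animals = text.match(/fox|dog/g);
console.log(animals); // [ 'fox', 'dog' ]

// Without parentheses, | splits the entire pattern: "lazy fox" OR "dog".
console.log(/lazy fox|dog/.test("dog"));   // true
console.log(/lazy (fox|dog)/.test("dog")); // false
```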
Lookaheads and Lookbehinds
- Lookahead: Matches based on what follows.
  - Positive: (?=...)
  - Negative: (?!...)
- Lookbehind: Matches based on what precedes.
  - Positive: (?<=...)
  - Negative: (?<!...)
Example
\d(?=px) matches a digit that is followed by “px” (use \d+(?=px) to match the whole number).
(?<=\$)\d+ matches digits preceded by “$”.
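Lookarounds check the surrounding text without including it in the match. The JavaScript sketch below assumes an engine that supports lookbehind (recent Node.js and current browsers do). The input strings are invented for illustration.

```javascript
const css = "width: 120px; opacity: 5; margin: 8px";

// Positive lookahead: numbers followed by "px" ("px" itself is not matched).
const pxValues = css.match(/\d+(?=px)/g);
// Negative lookahead: numbers NOT followed by "px". The \d inside the
// lookahead stops the engine from settling for a partial number like "12".
const unitless = css.match(/\d+(?!\d|px)/g);

const order = "cost $30, qty 4";
// Positive lookbehind: digits preceded by "$".
const dollars = order.match(/(?<=\$)\d+/g);
// Negative lookbehind: digits not preceded by "$" or by another digit.
const plain = order.match(/(?<![$\d])\d+/g);

console.log(pxValues); // [ '120', '8' ]
console.log(unitless); // [ '5' ]
console.log(dollars);  // [ '30' ]
console.log(plain);    // [ '4' ]
```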
Escaping Special Characters
To match special characters literally, escape them with a backslash (\).
Example
\. matches a literal dot.
\$ matches a literal dollar sign.
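Escaping becomes important when a pattern is built from arbitrary text. The sketch below uses a hypothetical helper, escapeRegExp, which is not a built-in function; it is one common way to backslash-escape metacharacters before passing a string to the RegExp constructor.

```javascript
// Hypothetical helper: put a backslash before every regex metacharacter.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const price = "$9.99";
// Unescaped: "$" acts as an end-of-string anchor and "." matches any character.
const naive = new RegExp(price);
// Escaped: every character is treated literally.
const safe = new RegExp(escapeRegExp(price));

console.log(escapeRegExp(price));        // \$9\.99
console.log(naive.test("total: $9.99")); // false
console.log(safe.test("total: $9.99"));  // true
console.log(safe.test("total: $9x99"));  // false
```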
Practical Example: Matching an IP Address
Regex
\d{1,3}(\.\d{1,3}){3}
Explanation
- \d{1,3}: Matches 1-3 digits.
- \.: Matches a literal dot.
- {3}: Repeats the previous group 3 times.
Combining Concepts
To match a string that is exactly an IP address, with nothing before or after it, use ^ and $ to anchor the pattern.
Example:
^\d{1,3}(\.\d{1,3}){3}$
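The effect of anchoring can be sketched in JavaScript as follows. Note that the pattern only checks the shape of the address, so out-of-range values still pass. The isIPv4 function is a hypothetical helper added here to show one way of also checking the 0-255 range; it is not part of the pattern above.

```javascript
const loose = /\d{1,3}(\.\d{1,3}){3}/;
const anchored = /^\d{1,3}(\.\d{1,3}){3}$/;

console.log(loose.test("ping 192.168.1.1 now"));    // true: found inside the text
console.log(anchored.test("ping 192.168.1.1 now")); // false: extra text around it
console.log(anchored.test("192.168.1.1"));          // true
console.log(anchored.test("192.168.1"));            // false: only three parts
console.log(anchored.test("999.999.999.999"));      // true: shape only, no range check

// Hypothetical helper: shape check plus a numeric range check per part.
function isIPv4(s) {
  return anchored.test(s) && s.split(".").every((part) => Number(part) <= 255);
}

console.log(isIPv4("10.0.0.255"));      // true
console.log(isIPv4("999.999.999.999")); // false
```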
Conclusion
Regex is a powerful tool for pattern matching. By combining concepts like quantifiers, groups, and anchors, you can create complex patterns to solve real-world problems.
If you found this tutorial helpful, leave a like or comment with your favorite regex use case. Stay curious and keep learning!